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SELF -GUIDED MOLECULAR DYNAMICS SIMULATIQN FOR 
" EFFIC IENT CONFORMATIONAL SEARCH a 

\^^> FIELD OF THE INVENTION 

The present invention relates to Molecular Dynamics based simulation 
5 techniques. In particular, the invention relates to a simulation method capable of 
generating trajectories of enhanced conformational space sampling. 

BACKGROUND OF THE INVENTION 
Computer simulation techniques provide invaluable tools for the study of 
physical, chemical and biological processes. These techniques are based on 
10 microscopic descriptions of molecular systems which are processed to provide 
macroscopic information useful in understanding physical, chemical and 
biological phenomena. In this regard, processing molecular data generated by 
computer simulation provides unique opportunities for designing new molecular 
systems tailored to specific needs. 
15 One popular computer simulation technique which employs microscopic 

description of a molecular system in predicting the system's macroscopic behavior 
is the Molecular dynamics (MD) technique. The MD simulation technique is a 
powerful computational tool which allows the exploration of structural, 
thermodynamic and kinetic properties of molecular systems. MD-based simulation 
20 methods have been successfully employed in the study systems varying from 
simple fluids to large proteins and DNA molecules. 

Molecular dynamics provides a methodology for detailed microscopic 
modeling on the molecular scale of a molecular system containing N components 
(TV-body). The conceptual basis for the physical relevance of data obtained 
25 through MD simulations is supported by the preposition that the nature of matter 
is to be found in the structure and motion of its constituent building blocks, and 
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therefore, the dynamic properties of a system are contained in the solution to the 
//-body problem. Given that the classical JV-body problem lacks a general 
analytical solution, the only path open is the numerical one. Scientists engaged in 
studying matter at this level require computational tools to allow them to follow 
5 the movement of individual molecules and it is this need that the molecular 
dynamics approach aims to fulfill. This requires massive amounts of computing 
resources. In the last decade, the use of MD simulations in the study of physical, 
chemical and biological problems has increased exponentially, primarily due to 

5=ssl 

the increased computing power of calculating machines. 
10 The Molecular Dynamics simulation technique has become a popular 

approach to study many systems in physics, chemistry, and biology and predict 
their properties. Conformational sampling and searching lays the fundamental 
ground for MD simulations. The major limitation associated with conventional 
MD simulations is their extremely inefficient conformational searching and 
15 sampling efficiency, which severely limits the applicability of MD simulation 
and/or accuracy when predicting the properties of a system. Therefore, MD 
simulations have been primarily applied to homogenous systems or to fast events 
in heterogeneous systems. In many problems such as protein folding, ligand 
binding, molecular docking, phase transition, the conformation space of the system 
20 is so large and the time scale of an event is so long that insight that can be 
obtained by conventional MD simulation techniques is very limited even with 
today's fastest computer. 

Molecular Dynamics techniques are based on the propagation of classical 
equations of motion. MD trajectories are generated by shifting the positions of the 
25 atoms forming the N-body system according to the forces exerted on each atom. 
At each new position, the interactions between a given atom and the remaining 
components of the molecular system are evaluated and derived to obtain the force 
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exerted on the atom at the given time step. That force is in turn employed in 
determining a new position for the atom. The position shifting and force 
evaluating is repeated along to form a trajectory. The trajectory is a collection of 
successive snapshots or frames separated in the temporal space by a time step of 

5 a given magnitude. The choice of a given time step is dictated by the nature of 
the system being simulated and is generally adjusted as a function of the 
complexity of the system as described, for example, by the complexity of the force 
field describing the interactions between the components of the simulated system 
and sometimes between those components and external elements. In general, the 

10 magnitude of a time step employed in a simulation is adjusted such that 
fluctuations in the energy of the simulated system between two successive time 
frames are small enough to allow the use of numerical approximations without 
drastically departing from purely theoretical trajectories. Acceptable time steps 
are generally a fraction of the time scale associated with significant changes in the 

15 topology of the energy surface from which the forces governing the motion being 
simulated are derived. Those changes are generally associated with the fastest 
movements. Therefore, the time step is generally adjusted to accurately describe 
the fastest movements in a molecular system. 

Motions in molecular systems take place on a wide range of time scales. 

20 Vibrations of covalent bonds take place in 10" 14 second, whereas large-scale 
conformational changes such as that in the protein folding process may take 
seconds or even hours. This large range of time scales presents MD simulations 
a great challenge. In an MD simulation, the classical equations of motion for all 
particles of a system are integrated over a certain period of time. Because of the 

25 limitation in the size of time steps used to solve the equations of motion, the time 
scale that an MD simulation can access is very short with current available 
computing power, typically in the range of pico-seconds (ps) to nano-seconds (ns). 
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This short time scale severely limits the applicability of MD simulations. To 
extend the applicability of MD simulations, a number of strategies have been used. 

One field of intense research is the study of biological phenomena by 
molecular dynamics simulations. These systems present opportunities for the most 

5 important and spectacular utilization of molecular simulation in providing insights 
which can rarely be accessed through actual experimentation. In particular, 
possibilities of designing new molecules active as bio-agents through simulation 
is an avenue of great potential. 

One of the biological problems that can be investigated by molecular 

10 dynamics simulation relates to employing the three dimensional structure of a 
protein in designing molecules that would bind specifically to the protein and 
thereby interfere with the activity of the protein. Since only a limited number of 
proteins having known tertiary structure are available, one important aspect in this 
regard relates to the prediction of a tertiary structure of a protein via computational 

1 5 methodology based solely on the amino acid sequence of said protein. Computer 
simulation techniques are important in this regard in view of the cost and time 
associated with the determination of such structure by experimental means such 
as NMR spectroscopy and x-ray crystallography which cannot be expected to keep 
pace with explosion of protein sequence information provided by DNA 

20 sequencing techniques. With the cost/performance of computational hardware 
being reduced by roughly a factor of two every eighteen months, and with 
improvements in software and algorithm development providing an even greater 
enhancement of computational efficiency, it is clear that the goal of computation- 
based structure determination will eventually be reached. 

25 However, there are still formidable problems, both conceptual and 

numerical, in making this objective a reality. These problems involve both the 
issue of the appropriate representation of the protein (most critically, the potential 
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functions that assign energies to any given protein structure) and the algorithmic 
issue of finding the native structure, starting from an unfolded chain, in a finite 
amount of computation time. These two central areas are of course intertwined: 
if computational expense was irrelevant, one could simply solve the Schrodinger 
5 equation for each protein configuration (averaging over all positions of the solvent 
molecules), thereby guaranteeing accurate evaluation of the potential energy. The 
coupled problem is then to have the total computation time required to obtaining 
the predicted structure, roughly the cost of evaluating the energy of each structure 
multiplied by the number of such evaluations, be tractable with current computing 

10 technology. The development of new potential functions and the investigation of 
improved methods for the minimization of functions (to reduce the number of 
evaluations of the potential function that are necessary) are therefore two key areas 
in computational studies of protein folding. 

The two most striking features of native protein structures in water are (a) 

1 5 the overwhelming tendency of charged or polar side chains to appear on the 
"exterior" of the protein (i.e. exposed to solvent) and hydrophobic side chains to 
be located in the "interior" of the protein (protected from solvent), forming a 
unique compact globular structure for a given sequence; (b) the formation of 
secondary structure segments, alpha-helices and beta-sheets, so as to satisfy 

20 backbone hydrogen bonds. In a naturally occurring globular protein, the pattern 
of hydrophobic and hydrophilic residues is such that the formation of a micellar 
structure, with the hydrophilic groups exposed to water and the hydrophobic 
groups buried in the interior, is possible. A central feature of proteins, as compared 
to other polymers or colloidal structures, is that this micellar formation must be 

25 compatible with the two secondary structure classes. If the backbone groups inside 
the protein cannot make a sufficient number of internal hydrogen bonds, the 
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protection of hydrophobic side chains from solvent will not compensate for the 
removal of the backbone carbonyl and amino moieties from aqueous solution. 

Simple topological arguments would then suggest that a relatively large 
fraction of the protein (-40-70%) must be in either an alpha-helix or a beta-sheet. 
5 Furthermore, each individual secondary structure region cannot be too long (or 
else the protein would lack the ability to fold into a compact globular structure) 
and the "loop" regions joining the secondary structure elements also cannot be too 
long (or else the backbone residues in the interior of this region would be have 
difficulty efficiently forming hydrogen bonds, as suggested above). The picture 
1 0 that then emerges is one of rigid rod-like segments (ignoring for the moment the 
twists and deformations of alpha-helices and beta-strands) joined by flexible loops, 
with each segment of length between 3 and 30 residues. This description in fact 
characterizes the vast majority of proteins whose structures are available in the 
protein data bank. 

15 It is well known from both theoretical and experimental results that 

secondary structure elements in proteins are marginally stable; indeed, helical and 
beta-sheet segments in proteins rarely retain their form when removed from the 
protein environment and studied as small peptides. Thus, protein folding is a 
highly cooperative phenomenon in which the final secondary structure pattern and 

20 tertiary architecture are determined by global optimization of the hydrophobic 
effect and backbone hydrogen bonding. The native structure is a result of a 
delicate balance of energetic terms; the overall thermodynamic stability of a native 
protein at room temperature is many orders of magnitude smaller than the total 
energy of the system. These features make the computational determination of 

25 protein structure extremely demanding, particularly with regard to the magnitude 
of the conformational searching problem. 
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The helix is a common secondary structural motif found in proteins, and the 
mechanism of helix-coil interconversion is key to understanding the protein- 
folding problem. One very direct approach to the mechanism of helix folding is 
through MD simulation. However, because of energy barriers and random walk, 
5 the helix folding process of peptides is a difficult task for the conventional MD 
simulation. MD simulation has so far been successful to study the relative fast 
process such as the folding dynamics of a pentapeptide in water and the 
conversion of 3 10 helix to a-helix of a 16-residue, alanine-based peptide in water. 
For folding simulations of larger peptides, modified force fields had to be used to 
10 circumvent the energy barrier problem. 

A simple strategy is to improve the computing power, and indeed, as 
computing power increases, more difficult problems can be studied. Another 
strategy is to improve MD simulation efficiency. For example, fast motions limit 
the size of time steps used to solve the equation of motion and make simulations 
15 very expensive. Therefore, when fast motions are not of major interest and are 
separable from other motions, constraints may be used to replace the degrees of 
freedom of fast motions, so that much larger time steps can be used. Other 
strategies to improve MD simulation efficiency include the use of multiple time 
steps to save computational load, the improvement of numerical integration 
20 methods and the use of solvation models to replace explicit solvent molecules. 
However, the improvement in MD simulation efficiency is far from sufficient to 
study slow events. 

Another very productive strategy is to speed up a slow event so that the 
event can take place fast enough to be observed on a time scale accessible by an 
25 MD simulation. Simulated annealing method is a very successful example in this 
category. In this method, high temperature is used to increase the ability of the 
system to overcome energy barriers. The system is then slowly cooled down to 
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low temperatures and simulated at physiologically relevant conditions. A major 
advantage of the simulated annealing MD method is that the system can 
effectively overcome energy barriers and reach different conformational regions 
at high temperature. A drawback of this method is that it needs multiple 
5 simulations to find the global minimum state of the system and to search properly 
the conformational space of the system. 

Therefore, there remains a great need for a computer simulation method 
which can provide a more complete exploration of the conformational space of a 
molecular system. Particularly, there remains a need for a method based on 

10 molecular dynamics which allows for generating trajectories which allow both a 
comprehensive sampling of the conformational space available to a molecular 
system while at the same time allowing the calculation of physically meaningful 
properties by processing the data generated in said simulations. 

SUMMARY OF THE INVENTION 

15 The present invention relates to a method of forming a generated 

conformation of a molecular system. The method comprises generating a self- 
guided molecular dynamics trajectory of the molecular system. The trajectory is 
initiated by assigning to each atom in the molecular system a set of initial 
parameters comprising an initial velocity vector and an initial conformation. The 

20 initial conformation is formed by disposing each atom in the molecular system is 
an initial position. The method further comprises propagating the molecular 
dynamics trajectory according to a computational procedure comprising for each 
atom i in the molecular system: (a) calculating a current force exerted on said atom 
i by deriving an energy function describing interactions between said atom i and 

25 other atoms in said molecular system;(b) determining a guiding force & describing 
an average along a portion of said trajectory of a force exerted on said atom i; (c) 
determining a current position and velocity of said atom i by adjusting a previous 
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position and velocity according to the equation m^ = f|, wherein n^ is the mass of 
atom i, a ; is an acceleration of atom i and f; is a force obtained by adding said 
current force calculated in (a) and said guiding force determined in (b); (d) 
replacing said previous position and velocity of said atom i with said current 
5 position and velocity of said atom i;(e) repeating steps (a) to (d) for one or more 
iterations to generate said trajectory; (f) forming said generated conformation by 
disposing the atoms of said molecule in respective current positions obtained 
during a given iteration and (g) processing said generated conformation to 
determine a structural, energetic, thermodynamic or kinetic property of said 
10 molecular system. 

One embodiment of the invention provides a method of determining said 
guiding force & comprising accumulating a force along a portion of said trajectory 
to obtain a cumulative force, dividing said cumulative force by the length of said 
portion of said trajectory to obtain an average cumulative force and multiplying 
15 said average cumulative force with a parameter X to obtain said guiding force Xg h 
wherein X is not zero. The length of said averaging portion can be defined by a 
number of iterations. The guiding force parameter X can be negative or positive. 
The parameters X and t L are adjusted according to the molecular system to be 
investigated by SGMD. For example, in SGMD simulations involving proteins, 
20 values of X between 0.01 and 10.00 are expected to be suitable for conducting the 
SGMD simulations. Values of X between 0.10 and 1.00 are most preferred. 

In another embodiment of the invention, the cumulative force g ; is obtained 
by adding said current force exerted on said atom i as calculated in step (a) along 
said portion of said trajectory. 
25 In yet another embodiment of the invention, the cumulative force is 

obtained by determining an external force exerted on said atom i obtained by 
deriving an energy function describing the interactions between said atom i and 
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other atoms in said molecular system which are not positioned within three 
covalent bonds or less from said atom i. 

In a further embodiment of the invention, said cumulative force is obtained 
by determining for atom i and each atom j positioned within three covalent bonds 
5 from said atom i, an external force exerted on said atom i or atom j obtained by 
deriving an energy function describing the interactions between said atom i or 
atom j and other atoms in said molecular system which are not positioned within 
three covalent bonds or less from said atom i; for each atom j multiplying said 
external force with a mass ratio m/Mj to obtain a mass weighed external force; 

10 and adding said mass weighed external force obtained for each atom j to said 
external force exerted on said atom i to obtain said cumulative force. 

In yet another embodiment, for each iteration propagating said molecular 
dynamics simulation, the current velocities obtained in step (c) are scaled in such 
a way that the total energy of said molecular system remains constant after the 

15 propagation. 

In yet another embodiment, the invention provides a method of generating 
a tertiary structure of a protein of unknown tertiary structure having a known 
primary structure defined by a sequence of amino-acids forming said protein. 

In another embodiment, the invention provides a method of docking a guest 
20 molecule into a host molecule comprising generating a self-guided molecular 
dynamics simulation, wherein the simulated molecular system comprises said 
guest and host molecules, and wherein the initial conformation describes said 
guest molecule positioned outside said host molecule and said molecular dynamics 
trajectory comprises a generated conformation wherein said guest molecule is 
25 positioned within said host molecule. 

In yet another embodiment, the invention provides a method of generating 
a phase transition of a molecular system by self-guided molecular dynamics 
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simulation according to the invention, wherein said initial conformation describes 
said molecular system in an initial physical state and said generated conformation 
describes said system in a physical state different from said initial physical state. 
In a further embodiment, the invention provides a method of generating a 
5 molecular dynamics trajectory of a molecular system comprising: generating a 
succession of snap shots, each snap shot describing a frame of said molecular 
system, and each snap shot is separated from a previous snap shot in said 
trajectory by a time step St, said trajectory having a total duration of up to tf ina , = 
N6t, wherein N is a predetermined integer greater or equal to 1, and wherein 

10 generating said trajectory comprises (a) initiating said molecular dynamics 
trajectory by assigning to each atom in said molecular system a set of initial 
parameters comprising an initial velocity vector and an initial frame wherein each 
atom in said molecular system is disposed in an initial position; (b) propagating 
said molecular dynamics trajectory by a computational procedure comprising for 

15 each atom i by: (i) calculating a current force exerted on said atom i by deriving 
an energy function describing interactions between said atom i and other atoms 
in said molecular system; (ii) determining a guiding force g ; describing an average 
along a portion of said trajectory of a force exerted on said atom i; (iii) 
determining a current position and velocity of said atom i by adjusting said 

20 previous position and velocity according to the equation = fj, wherein rrij is the 
mass of atom i, is an acceleration of atom i and f { is a force obtained by adding 
said current force calculated in (i) and said guiding force determined in (ii); (iv) 
replacing said previous position and velocity of said atom i with said current 
position and velocity of said atom i;(c) repeating step (b) for up to N-l iterations 

25 to generate said trajectory and processing said trajectory to determine a structural, 
energetic, thermodynamic or kinetic property of said molecular system. 
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Yet another embodiment of the invention provides a method of generating 
a self-guided molecular dynamics treaectory, wherein a guiding force gj(tp) at a 
snap shot tp = pSt, wherein p is <N, is equal to a guiding force gi (s) (tp), wherein 
gi (a) (tp) is determined according to equation (1) as follows: 

5 gi <s) (t p ) = (1 - 1/L) gj (s >(tp -6t) + (l/L)m£ jeSi + Xg^% - St))/]^} 

wherein gj <s) (tp) is the guiding force exerted on atom i at a snapshot tp, gj (s) (tp -6t) 
is the guiding force exerted on atom i at a snapshot (tp -St), L is a length of a 
portion of said trajectory employed in computing said cumulative guiding force, 
m { , atomic mass of atoms i, respectively, S< is a group formed by all atoms in said 

10 molecular system which are linked to atom i through three covalent bonds or less, 
Mj is the total mass of atoms in S J5 is a force exerted on atom j by all atoms 
in said molecular system which are not in group Sj; X is a predetermined guiding 
force multiplication factor. 

In a further embodiment, the invention provides a method of generating a 

1 5 self-guided molecular dynamics trajectory, wherein determining a current position 
r s (t p + St) of an atom i at snap shot p +1 , comprises adjusting a position of said 
atom i according to equation (2) as follows: 

r(t p +6t) = rCg + Vi (tp +8t/2)8t 
wherein r(tp) is the position of atom i at a snap shot immediately preceding the 

20 snap shot tp + 6t and Vj(tp +St/2) is the velocity of atom i at half a time step St 
separating snap shots p and p +1, wherein v ; (tp +5t/2) is calculated according to 
equation (3) as follows: 

Vj (tp +6t/2) = Vj (tp -St/2) + (f s (tp) + Ag^g^t/m,. 
In yet a further embodiment, the invention provides a method of predicting 

25 an unknown folded structure of a polypeptide. The method comprises generating 
a self-guided molecular dynamics trajectory according to the invention, wherein 
said molecular system comprises said polypeptide and said initial frame is defined 
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by disposing said polypeptide in a random conformation; calculating a 
thermodynamic property along said trajectory; and extracting from said trajectory 
a selected frame associated with a predetermined criterion based on said 
thermodynamic property and forming a predicted conformation by assigning to 
5 each atom in said polypeptide a corresponding position in said selected frame. 
Said molecular system being preferably immersed in a box of water molecules. 

In yet another embodiment, the invention provide a method for the 
refinement of experimental determined structures. With NMR determined 
constraints for x-ray determined low resolution conformation, this invention can 
1 0 generate high resolution structures. 

RRTFF DESCRIPTION OF THE FIGURES 
Figures 1 (a)-(c) show an all-atom model of the alanine dipeptide, bonded 
substructures of atom CT and bonded substructure of atom C u respectively; 

Figure 2 shows <J)-i|f dihedral angle distribution (contour map) of the 
15 alanine dipeptideobtained from a 10,000 ps MD simulation in vacuum at 700 K 
using the AMBER force field; 

Figure^shows (J>-i|i dihedral angle distribution (contour map) of the 
alanine dipeptide obtained from a 10,000 ps self-guided MD simulation in vacuum 
at 700 K using the AMBER force field. In the simulation, k=0. 1 and tj=0.2 ps; 
20 Figure 4 shows (|>-t|f dihedral angle distribution of the alanine dipeptide 

obtained from a 5,000 ps MD simulation in explicit water at 300 K using the 
AMBER force field; 

Figure^shows (J)-i|f dihedral angle distribution of the alanine dipeptide 
obtained from a 5,000 ps self-guided MD simulation in explicit water at 300 K 
25 using the AMBER force field. In the simulation, X=0.5 and tf= 2 ps; 

Figures 6(a)jnd6(b) are time profiles of (j) and ijr, respectively during MD 
and SGMD simulations of the alanine dipeptide obtained from 5,000 ps MD or 
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self-guided MD simulations in explicit water at 300 K using the AMBER force 
field; 

Figure 7 shows snapshots of the conformations of the 16-residue peptide 
obtained from the 10,000 ps conventional MD simulation in vacuum at 300 K 
5 using the AMBER force field, starting from the fully extended conformation. For 
clarity, hydrogen atoms were not shown; 

Figure 8 shows snapshots of the conformations of the 16-residue peptide 
obtained from a 10,000 ps self-guided MD simulation in vacuum at 300 K using 
the AMBER force field, starting from the fully extended conformation. For clarity, 
10 hydrogen atoms were not shown. In the simulation, A=0.1 and ti=0.2 ps; 

Figure 9 shows snapshots of the conformations of the 16-residue peptide 
obtained from a 10,000 ps self-guided MD simulation in vacuum at 300 K using 
the AMBER force field, starting from a local energy minimum conformation 
obtained from the MD simulation in vacuum at 300 K, For clarity, hydrogen atoms 
15 were not shown. In the simulation, A=0. 1 and ti=0,2 ps; 

Figure 10 shows the potential energies of the 16-residue peptide during 
simulations in vacuum at 300 K using the AMBER force field. The points shown 
in this figure are sub-averages over a period of 1 00 ps. The dashed line is the result 
obtained from the conventional MD simulation; the dotted line is the result 
20 obtained from the self-guided MD simulation starting from a fully extended 
conformation; and the solid line is the result obtained from the self-guided MD 
simulation starting from "the trapped structure" obtained at the end of the 10,000 
ps conventional MD simulation; 

FigureJXshows a system of the 16-residue peptide in 1879 explicit waters. 
25 Cubic periodic boundary condition was applied in the simulations; 
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Figure 12 contains snapshots of the conformations of the 16-residue 
synthetic peptide during a 200 ps conventional MD simulation in explicit water 
at 300 K using the CHARMM force field; 

Figure 13_ contains snapshots of the conformations of the 16-residue 
5 synthetic peptide during a 200 ps self-guided MD simulation in explicit water at 
300 K using the CHARMM force field. For clarity, hydrogen atoms were not 
shown. In the simulation, A=0.5 and t,= 2 ps; 

Figure 14 contains snapshots of the conformations of the 16-residue 
synthetic peptideliuring a 200 ps self-guided MD simulation in explicit water at 
1 0 300 K using the AMBER force field. For clarity, hydrogen atoms were not shown. 
In the simulation, X=0.5 and t,= 2 ps; 

Figure 15(a) shows the time profile of a gyration radius along a 10 ns 
conventional MD simulation of a 16-mer peptide; 

Figure 15(b) shows the time profile of a gyration radius along a 10 ns 
1 5 SGMD simulation of a 1 6-mer peptide; 

Figure 16 shows Ramachandran plot of the 1 6-mer peptide in 10 ns SGMD 
simulation; " 

Figure 17 shows the time profile of the helix content along a 10 ns SGMD 
simulation of a 16-mer peptide; 
20 Figure JjSjjhows the total number of helical residues for each conformation 

in a 1 0 ns SGMD simulation of a 1 6-mer peptide; 

Figure 19 shows population of 9 major conformational clusters; 
Figure 20 shows the time profile of secondary structures along the first 400 
ps of a 10 ns SGMD simulation of a 16-mer peptide; 
25 Figure 2 1 shows a proposed helix forming initiation mechanism; 
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Figure 22 shows the time profile of a gyration radius and the time profile 
of the helix content along a 10 ns SGMD simulation of a 16-mer peptide 
performed with A.=0.3 and t,=2 p; 

Figure 23^shows the structure of YPGDV; 
5 Figure 24 shows a (f>-i|; dihedral angle distribution of YPGDV obtained 

from a 2 ns conventional MD simulation; 

Figure 25 shows a <|>-i|f dihedral angle distribution of YPGDV obtained 
from a 2 ns SGMD simulation; 

Figure 26 shows the center conformation of nine major clusters observed 
10 during 2 ns SGMD simulations with X=Q.Q (conventional MD), 0.1, 0.2, 0.3,0.4 
and 0.5; 

Figure 27 shows conformational transitions among the nine clusters 
described in Figure 25; 

Figure 28 shows the argon system with tetragonal periodic boundary 
15 conditions; 

Figure 29 shows the time profile of the potential energy along a SGMD 
trajectory of 500 argon atoms; 

Figure 30 is the initial configuration of the guest and host molecular 
system in a box~of waters (For clarity, water molecules are not shown); and 
20 Figure 31 is the predicted complex structure in comparison with the X-ray 

determined structure. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 
The present invention provides a novel simulation method for generating 
trajectories which describe the evolution of a molecular system during a 
25 determined period of time. The method is based on the molecular dynamics 
technique and incorporates simulation parameters which significantly enhance the 
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conformational sampling capabilities of molecular-dynamics simulation 
technologies. 

Specifically, the method of the invention includes calculating a guiding 
force and implementing such guiding force in the generation of a molecular 
5 dynamics trajectory. The guiding force is accumulated through the ongoing 
simulation. The guiding force enhances the systematic motions taking place during 
the simulation. The systematic motions are defined as the motions resulting form 
the average forces exerted on the simulated system along the molecular dynamics 
trajectory. That is, by averaging the forces exerted on the system during the 
10 simulation, short range random motions are cancelled out and the forces related 
to long range movements are enhanced. 

The guiding force implemented in the method of the invention is obtained 
by averaging the classical force exerted on a given atom over a given portion of 
the generated trajectory. The averaged force is then employed in adjusting the 
1 5 position of the atom from one snap shot to the next. That is, at each time step, the 
position of the atom is adjusted according to a force which includes both the force 
generally employed in conventional molecular dynamics simulations and the 
guiding force, which is obtained by averaging the classical force over a portion of 
the trajectory being generated immediately before the current time step. Since the 
20 guiding force is calculated based on data accumulated during the simulation of the 
trajectory being generated, the method is referred to as Self-Guided Molecular 
Dynamics or SGMD. A detailed description of the basic principles and the 
computational technologies involved in its implementation can be found in "Self- 
Guided Molecular Dynamics Simulation for Efficient Conformational Search, 
25 Xiongwu Wu and Shaomeng Wang, The Journal of Physical Chemistry B, Vol. 
102, Number 37, pages 7238-7250. The contents of the article and the references 
cited therein are herby incorporated by reference in their entirety. 
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The guiding force introduced into the molecular dynamics simulations 
according to the invention can be controlled by several parameters. Like the 
classical force employed in conventional MD and in SGMD, the guiding force is 
determined by the force field employed in generating the trajectory. In addition, 
5 the invention provides two main parameters for controlling the degree of 
enhancement of the systematic motions during an SGMD simulation. Those 
parameters are the length of the trajectory portion which is employed in computing 
the average force. The length of trajectory can either be express in terms of the 
number of time steps L included in calculating the average force exerted on an 

10 atom or the corresponding simulation time L6t, wherein 6t denotes the time step 
employed in propagating the equations of motion of the molecular system. 

The method of the invention provides important advantages. For example, 
a trajectory generated through SGMD provides a significantly enhanced search of 
the conformational space available to a molecular system while allowing 

1 5 providing physically significant structural, thermodynamics and kinetic data. The 
combination of enhanced conformational sampling and conservation of the 
physical relevance of the data obtained through SGMD simulations opens many 
of the applications that have been inaccessible to conventional MD techniques as 
well as some the available modified techniques. 

20 For example, the SGMD technique can be advantageously employed in the 

investigation of protein folding and the prediction of the tertiary structure of a 
protein of known amino acid sequence and unknown structure. In biological 
systems, protein folding generally takes milliseconds to hours, far beyond the 
reach of conventional MD simulations with current computer power. 

25 Homology modeling is a very popular and powerful means in predicting 3D 

structures of proteins of unknown tertiary structure based on the known 3D 
structure of a homologous protein, which is employed as the template. The 
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accuracy of this technique is intimately related to degree of homology between the 
amino acid sequences of the protein of unknown tertiary structure and the protein 
serving as a template. When the degree of homology is not sufficiently high, the 
refinement on the template is necessary to take into account the effect of the 
5 sequence mismatches on the structure of the protein. This requires refinement 
techniques which can comprehensively explore the conformational space available 
to the template, thereby enhancing the possibility of identifying the structure 
corresponding to the global energy minimum. As described in relation to the 
general protein folding applications of the SGMD method, the technique can be 

10 equally advantageously employed in homology based protein structure 
determination techniques by perform simulations to fold the regions that have very 
unreliable predictions from homology modeling e.g., regions with a very low 
degree of homology. 

Another application which can be advantageously explored by the SGMD 

15 simulation technique of the invention relates to ligand binding and molecular 
docking. These aspects are subjects of great practical application in design, 
discovery and development of new therapeutic agents in pharmaceutical industry. 
The information how a drug binds and interacts with its target molecule such as 
an enzyme or a receptor is critical to the success of rational drug design. NMR 

20 and X-ray diffraction techniques are experimental means of providing such 
information but such methods are very expensive and painstaking. SGMD allows 
major improvements in the study of the host-guest dynamics. In particular, the 
method of the invention allows more flexibility to be introduced into the simulated 
molecular system by incorporating the degrees of freedom of both the guest 

25 (ligand) and host (receptor) thereby dramatically increasing the structural 
diversity of the possible host-guest associations. The advantages provided by 
SGMD in the investigation of guest-host dynamics and energetics will open the 
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doors to important application highly important fields, such as rational drug 
design. 

Another advantage of the enhanced conformational sampling provided by 
the method of the invention related to the ability to describe complicated events, 
5 such as phase transition. Phase transition is an important process with 
fundamental importance in physics, chemistry and biology. Because phase 
transition such as crystallization of liquids is a process associated with significant 
free-energy barriers, conventional MD simulation has very limited application in 
the applied to simulate phase transition events, such as the crystallization process 

1 0 of simple liquids. In this regard, the SGMD simulation techniques advantageously 
provides a direct approach to investigate phase transition processes, and to predict 
phase states and structure of systems of interest. 

Other avenues for advantageous implementation of the SGMD technique 
provided by the invention include multiple dimensional NMR, X-ray diffraction 

1 5 techniques and other molecular structure determination techniques based on the 
combination of simulation and spectroscopic data. 

In the context of conformational searching, the motions of a system may 
be divided into random and systematic motions. While the random motions are 
important to the sampling of detailed conformational distributions of the system, 

20 the systematic motions of the system are more relevant to the conformational 
search of a large conformational space. 

For a many-body system, the systematic motions are normally much slower 
than the random motions because of energy barriers and random walk. Energy 
barriers are created by non-uniform energy surfaces and random walk is due to the 

25 frequent collisions between atoms. One example of random walk is the diffusion 
of solute molecules in solution. The constant collisions of solute molecules with 
surrounding solvent molecules make the solute molecules go through a very 
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complicated and detoured path, which looks like a random walk. Consequently, 
the diffusion speed of a solute is therefore far slower than its actual thermal 
motion. The slow systematic motions are the bottleneck that limits the efficiency 
of a conformational search during an MD simulation. Therefore, if we can 
5 somehow specifically enhance the systematic motions, the efficiency of 
conformational searching may be improved. 

The present invention relates to a new MD simulation method, termed the 
self-guided MD simulation method, with the primary goal of improving the 
conformational searching efficiency. In this method, an extra guiding force is 

10 introduced, in addition to the instantaneous force derived from the force field, into 
the equation of motion to speed up the systematic motion of the system. 
THE SELF-GUIDED MD SIMULATION METHOD 

Conformational searching is a basic approach in theoretical and 
computational chemistry and molecular dynamics [MD] simulation and is a 

15 powerful computational tool to study structural, thermodynamic and kinetic 
properties of molecular systems. This new MD simulation method was based 
upon a new equation of motion in which a guiding force was introduced, in 
addition to the instantaneous force, to accelerate the systematic motion. This 
guiding force may be calculated as a time average of the instantaneous force from 

20 the trajectory obtained from the very same MD simulation. Bonded substructures 
were defined for molecular systems in order to enhance effectively both the inter- 
and intra-molecular systematic motions. 

The guiding effect can be adjusted through two parameters A. and t,, a larger 
value of A will speed up the systematic conformational changes and improve the 

25 overall conformational searching efficiency. However, a too large value of X 
might result in more alteration in ensemble averages and the conformational 
distribution. A proper value of X will significantly improve the overall 
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conformational searching efficiency and at the same time, will not significantly 
alter the thermodynamic properties of the system. Since t, represents the averaging 
time, a relative small value should be chosen for systems with fast conformational 
changes and a relatively large value should be chosen for systems with slow 
5 conformational changes to achieve the best guiding effect. 
Equation of motion with an enhanced s ystematic motion 

The equation of motion used in the conventional MD simulation under the 
Newtonian law is shown in eq. (1). 



10 the force acting on atom i. 

In a many-body system, most of motions change their direction frequently 
due to collisions and restraints, such as bond stretching and angle bending. The 
systematic motion of atom i may be described by its average motion during a 
certain time period. 

1 5 Wherein Vj is the average velocity for atom i from time t - 1, to time t, r^t) 

is the coordinate of atom i at time t, v t (r) is the velocity of atom i at time x. 

According to eq. (1), the equation of the systematic motion can be 
expressed by eq. (3) as follows: 



m a ~f 

i i J i 



(1) 



Wherein n^ is the mass of atom i, a, is the acceleration of atom i and f k is 




(2) 



Wherein a 4 is the (average) acceleration of the systematic motion of atom 
20 i at time t, a^T) is the acceleration of atom i at time t and f x is the average force. 
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Eq. (3) clearly shows that the acceleration of the systematic motion is governed 
by the average force. 

Therefore, the introduction of the average force into the equation of motion 
as a guiding force can in principle alter the systematic motion of the system. In 
5 order to control the effect of introducing a guiding force into the equation of 
motion of simulated system, the invention provides a guiding factor X which is 
introduced in eq. (3). With a guiding factor in eq. (3) and combining it with eq. 
(1), a new equation of motion can be obtained, as shown in eq. (4). 

Combining the new equation (3) and equation (4) provides equation (4) 
10 which describes an enhanced systematic motion equation: 

m A . =m fa . + ka~) (3) 

Wherein A t is the acceleration of atom i with altered systematic motion as 
compared to a; in eq. (1), X is a guiding factor, which can be any value. Since the 
acceleration of the systematic (average) motion is governed by the average force, 
a positive value of X should speed up the systematic motion and a negative value 
15 of A. should slow down the systematic motion. A value of 0 for X yield an eq. 
which is identical to the classical Newtonian equation of motion, described in eq. 
(1). 

The implementation of eq. (4) with X>0 provides an MD simulation with 
enhanced systematic motion. However, to perform an MD simulation precisely 
20 according to eq. (4), one needs to perform an MD simulation according to eq. (1 ) 
to obtain the average force. This needs to be done at every time step, which 
essentially makes an MD simulation according to eq. (4) impractical. To overcome 
this difficulty, eq. (4) was modified to obtain eq. (5) as follows: 

m / i r m p^^ m 7-i%^i (4) 
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Where g, is calculated from eq. (6) as follows: 

s~ m Jrj^jf^) dx (5) 

While simulations generated according to equation (5) provide trajectories 
with enhanced systematic motion, the enhanced systematic motion includes inter- 
and intra-molecular systematic motions as discussed above, most of atomic 
5 motions related to intra-molecular interaction or bonded interactions are fast, while 
a systematic motion generally is very slow. During a conformational change, most 
of the internal coordinates, such as bond lengths, bond angles, and some of 
dihedral angles, just oscillate around their equilibrium values. Even though most 
of these high frequent motions do not contribute to a systematic conformational 

10 change, they create significant noise when using a time average velocity to 
describe the systematic motion. Hence, a guiding force estimated based upon the 
total force in a molecular system may be not very effective to accelerate the 
systematic motion. In one embodiment, the present invention provides bonded 
substructures to describe the local structural rigidity of a molecule and to derive 

15 a guiding force to enhance effectively the systematic motion. 

For each atom, i, a bonded substructure, S 4 is defined as a part of the 
molecule which includes atom i itself and all other atoms having a non-zero 
bonded interaction with atom i, i.e., atoms interacting with atom i, through either 
a bond, a bond angle or a dihedral angle. In other words, all atoms in a molecule 

20 to atom i through three or fewer covalent bonds belong to bonded substructure Sj. 
Atom i is called the central atom of substructure Sj. Each atom has a bonded 
substructure and two bonded substructures in a molecule can be partially or even 
totally overlapping as long as the central atoms are different. An example of 
substructures according to the invention is depicted in Figure 1. Figures l(a)-(c) 
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depict an all-atom model for an alanine dipeptide and two bonded substructures 
for two atoms in the molecule. 

Within the framework of the bonded substructures provided by the 
invention, a conformational change in a molecule can be viewed as the external 
5 motion of bonded substructures on a particular bonded substructure S, and the 
internal motion within said bonded substructures The systematic 

conformational change of a molecule can be described by the external systematic 
motion of bonded substructures. Since the latter, the external motion is important 
for the systematic conformational change since selective enhancement of the 

10 external systematic motion of bonded substructures effectively improve the 
conformational searching efficiency for molecular systems in an MD simulation 
according to the invention. 

To accelerate the external systematic motion of the bonded substructures, 
the invention provides a specific guiding force gi (s) , which applied to atom i. the 

15 specific guiding force is incorporated in the equation of motion of atom i 
according to equation (7) as follows: Analogue to eq. (5), the equation of motion 
for atom i with a guiding force g/ s) can be written in equation (7) as follows: 

ma r f+kgf (6) 

It should be noted that a ; is different from that in the classical Newtonian 
equation of motion expressed in eq (1). As well, since the guiding force gj (s) is 
20 used to accelerate the systematic motion of bonded substructures, gj (s) can no 
longer be calculated according to eq. (6), the mathematical machinery associated 
with the motion of bonded substructures is described below. 

For atom i, only the interaction force between atom i and atoms outside S; 
is included in calculating the external force exerted on bonded substructure Sj. 
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This interaction force is denoted as /j (s) , which can be calculated according to eq. 
(8) as follows: 

Wherein /y is the force exerted by atom i on between atoms i and j . The 
summation is over all the atoms that do not belong to bonded substructure S f 
5 0* *Si). 

With an extra guiding force A.g, (s) , the contribution of the combined force on 
atom i to the acceleration of S 4 is as follows: 

Where M, is the total mass of S i5 a^ is the acceleration of S x contributed from 
the combined force on atom i. Such acceleration will apply to all the atoms in Sj. 
10 If we neglect the rotation of the bonded substructure, for atom j in S i5 the 
acceleration of atom j resulted from the above motion of Sj in eq. (9) is as follows: 

Because atom j may belong to multiple bonded substructures, the total 
acceleration of atom j, a/ S) resulting from the external motion exerted on these 
bonded substructures is the summation of the eternal force over all the bonded 
15 substructures to which atoms j belongs. For each bonded substructure Sj to which 
atom j belongs, its central atom i should also belong to S, (j e S; ). Hence, we 
have: 
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•r-$v$-sfr*r> am 

Because g/ s) is introduced to enhance specifically the systematic motion of 
atom i resulted from the external motion of bonded substructures, it can be thus 
calculated from eq. (12). 




Wherein (j e S ; ) indicates that atom j belongs to substructure S i5 aj (S) is the 
5 time average of the total acceleration of atom i associated with the systematic 
motion exerted on the bonded substructures Sj. 

Hence, a self-guided MD simulation with an enhanced (altered) systematic 
motion of bonded substructures is performed according to eq. (7). Eq. (7) is called 
the equation of motion with an enhanced systematic motion for molecular systems. 

1 0 The force ^ on atom i is calculated exactly in the same manner as in a conventional 
MD. The guiding force on atom i is calculated according to eq. (12). It should be 
noted that for mono-atomic molecular systems, the bonded substructure becomes 
a single atom. In this case, eq. (12) reduces to eq. (6) and eq. (7) reduces to eq. (5). 
Estimation of the guiding force 

15 As discussed above, the estimation of a guiding force according to equation 

(6) can be time consuming and thus may offset the numerical advantages 
associated with the enhanced systematic motion procedure. Therefore, the 
invention provides a technique for minimizing the computational cost of 
determining the guiding force g { (t). Computational efficiency for the calculation 
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of the time average for any property P, can be achieved through implementation 
of eq. (13) as follows: 



t-bt 



_ 5, ' 5 f ^ 



(12) 



10 



Wherein P(t)is the time average of property P at time t, P(t) is the value of 
P at time x, 6t is the size of one MD time step, and t, is the averaging time. 

Eq. (13) shows that the time average of P at the current time step, P(t) can 
be estimated from the time average of the previous time step, P(t - St) and the 
value of P at the current step P(t). This approximation can effectively save both 
memory and cpu time. 

Accordingly, the guiding force in eq. (12) can be approximately calculated 
according to eq. (14). 



1-* 



m 



(13) 



15 



t, should be larger than or equal to the size of one MD time step, St. 
The self-guided MD s i mulation a l gorithm 

A self-guided MD simulation is performed by numerically solving eq. (7). 
In our implementation, the leap-frog algorithm was employed, as shown in eq. 
(15) and (16) as follows: 



5/ 



r.(f+5/)=r.(0+vI f+— 



6t 



(14) 
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(15) 



Wherein v^t) is the velocity of atom i at time t, fj(t) is the force on atom i at 
time t, Xgj (S) (t) is the guiding force on atom i at time t and X is the guiding factor. 

Because of the extra guiding forces, now atoms move along a trajectory 
departing from the classical Newtonian dynamics. For micro-canonical ensemble 
5 simulation, the total energy of the system should be conserved. This can be 
accomplished by scaling the velocities of atoms in the system after each time step. 
The velocities at the next half time step, t + St/2, are scaled according to eq. (17). 



Wherein Xe is called the energy conservation scaling factor, which was introduced 
to make the total energy conserved during each time step. 




(16) 



10 



Hence, with the scaling factor x E , the kinetic energy at t + 6t/2 is: 




According to eq. (7), the kinetic energy change at each time step without the 
scaling factor % E can be expressed as: 
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t-M = £mv.(,) 5v.= 
£ mv.(,)a.(,)6r=£ (f +Ag. (S) )v.(,)5, 



(18) 



■a 

fif 

: is? 
= IT? 



While the potential energy change is: 

6£ = -J^f.fir. = -£f.v.(r)6f 
p *— ' i i ^— ' i i v y 



(19) 



With the scaling factor in eq. (17), the total energy is thus conserved Hence, 
the kinetic energy change should be equal to the negative potential energy change. 



(20) 



From eq. (19) and eq. (21), the following equation can be derived to 
5 calculate the scaling factor % E : 





f 6,1 
t- — 

I 2 J 






f 



1/2 



(21) 
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Wherein v,(t) is calculated by 



5f 



(22) 



For constant temperature simulation, Berendsen et al. proposed a velocity 
scaling method 21 that uses a velocity rescaling factor Xb as follows: 











I 'r 


IT )) 







(23) 



to maintain a constant temperature T. Wherein T is the current kinetic temperature 
5 and tr is a preset time constant. Here, the energy conservation scaling factor % B 
and Berendsen' s velocity rescaling factor Xb to obtain a velocity scaling factor Xt> 
for constant temperature simulation based on the self-guided MD method of the 
invention as follows: 

X T = X B X E (24) 

Alternative to the scaling method described above, a constant temperature 
10 simulation using the self-guided MD method can also be accomplished using the 
constraint method to keep the kinetic energy constant. 22 " 25 The constrained 
equation of the self-guided motion can be written as: 

« A = f.+Ag. (S) - 5v, (25) 



WO 00/10067 



PCT/US99/18302 



-32- 

Wherein £ is a kind of "friction coefficient", which varies to constrain the kinetic 
energy to a constant value. This constraint method has also been implemented but 
was not used in the simulations presented in this paper. Thus, the detail algorithm 
of its implementation is not discussed here. 
5 Based on the above mathematical framework for employing the "leap-frog" 

algorithms generating a self-guided MD simulation according to the invention, a 
computer simulation is performed as follows: 

(a) Provide a set of initial conditions for starting the simulation. The 
initial conditions include for each atomists of simulated molecular 

10 system an initial position at t=0, an initial velocity v(t==0), a time 

step 5t, a guiding factor k and a guiding force averaging time, t, and 
an initial guiding force gi 0) (0). 

(b) At time step t, calculate the force on each atom, fj(t). This 
calculation is exactly the same as that in a conventional MD 

15 simulation. Calculate the guiding force according to (27) which is 

an approximation of eq. 14. 



g ; (S) (0= 



g / 8 >(r-60+— «X J 77 



(c) Forward the velocities to the next half time step, according to eq. 
(15). For constant energy simulations, the velocities are scaled 
according to eq. (17). For constant temperature simulations, the 

20 velocities are scaled by the scaling factor calculated according to eq. 

(25). 

(d) Forward the positions to the next time step, according to eq. ( 1 6). If 
bond lengths or other internal coordinates are to be fixed, the 
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SHAKE algorithm developed by Ryckaert et aL 6 is applied to get 
the constrained positions, 
(e) Repeat steps (b) to (d) until the end of the simulation. 

EXAMPLES 

5 EXAMPLE 1 

Simulation Conditions 

In this example, the self-guided MD simulation method was applied to two 
peptides, an alanine dipeptide and a neutral, 16-residue synthetic peptide. The 
structure of the alanine dipeptide is shown in Figure 1(a)- The alanine dipeptide 

10 system has been extensively studied by MD simulations and other computational 
methods. The neutral, 16-residue synthetic peptide has a sequence of CH 3 CO- 
AAQAAAAQAAAAQAAY-NH 2 , where A refers to an alanine, Q refers to a 
glutamine, and Y refers to a tyrosine. This synthetic peptide was determined 
experimentally using the circular dichroism (CD) spectrum to have ca. 50% helix 

1 5 secondary structure in aqueous solution. All-atom models were used to represent 
these peptides. For the purpose of comparison, we also carried out comparative 
simulations for these two peptides using the conventional MD method. Identical 
conditions were used when carrying out the conventional MD and the self-guided 
MD simulations except in the conventional MD simulations, the guiding factor X 

20 was set to be 0. 

The self-guided MD simulation algorithm was implemented in both the 
CHARMM and the AMBER programs. The MD simulations were performed 
using the CHARMM package with its version 22 force field parameters or the 
AMBER (version 4.0) with its all-atom force field parameters. Both sets of force 

25 field parameters include bond length, bond angle, dihedral angel and improper 
dihedral angle interactions for bonded interaction, Lennard- Jones 6-12 potential 
and electrostatic interaction for non-bonded interaction. The AMBER force field 
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also includes an explicit hydrogen bond potential. In all simulations, a constant 
dielectric constant of 1 was used. All results obtained using the AMBER force 
field are reported below. For brevity, the results obtained using the CHARMM 
force field are reported only for the simulation of the 1 6-residue peptide in explicit 
5 water at 300 K using the conventional MD and the self-guided MD simulation 
methods. 

Fully extended conformations were used as the initial conformation for these 
peptides. The aqueous solution for a peptide was built by immersing the peptide 
into an equilibrated water box and deleting overlapping water molecules found 
10 within 2.45 A of the peptide. The TIP3P water model 41 was used to represent 
water molecules. 

For simulations conducted in vacuum i.e., with no explicit water molecules 
in the molecular system, non-bonded interactions were calculated with no cutoff. 
For simulations in explicit water, non-bonded interactions were calculated with a 

15 cutoff value of 14 A and a switching function was applied from 8 to 12 A to 
smooth the interaction change across the cutoff Cubic periodic boundary 
conditions were applied to molecular systems including explicit water molecules. 
A time step 6t of 0.002 ps was used in all simulations. A neighbor list was created 
for interaction calculation and updated every 10 MD time steps. The SHAKE 

20 algorithm was applied to constrain all bonds. All simulations were conducted at 
constant temperature. The velocity scaling method was used to maintain a constant 
temperature and tp in eq. (24) was set to be 1 ps in all simulations. 
(A) The Dipeptide System 

1. Alanine dipeptide simulations in vacuum 

25 The alanine dipeptide has a total of 22 atoms with two peptide bonds (Figure 

1(a)) . Two dihedral angles, 4> for C r N r C a -C2 and i|/ for N r C a -C 2 -N 2 , in this 
molecule can freely rotate and the rotation of these two torsion angles results in 
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different conformations for the molecule. The conformation of this molecule can 
be described by the values of <j> and i|r. Conformations with different (f> and i|r 
angles have different potential energy because of the interactions between atoms 
in the molecule. Therefore, the conformational distribution is not uniform 
5 throughout the <|>-t|j space. The (f>-i|/ distribution is characteristic for each amino 
acid and this distribution can be shown by a Ramachandran plot, which depicts the 
conformational densities in a 2D plot with 4> being the X-axis and ij; being the Y- 
axis. 

Two comparative 10,000 ps simulations were performed to the peptide at 
1 0 700 K with the AMBER force field. For the purpose of comparison, a temperature 
of 700 K was chosen for both simulations because it was found that at lower 
temperatures such as 300 K, a 10,000 ps simulation using the conventional MD 
method does not provide proper sampling of the entire conformational space and 
a much longer simulation time would be required to obtain a proper sampling. The 
1 5 parameters for the self-guided MD simulation were chosen as A,=0. 1 and t,=0.2 ps. 
Trajectories were recorded every 0.2 ps in both simulations for analysis. 

First, the ensemble average properties obtained from these two simulations 
were evaluated. The ensemble averages of temperature, the fluctuation of 
temperature, the total energy, the fluctuation of total energy, the potential energy, 
20 the fluctuation of the potential energy, the kinetic energy and fluctuation of kinetic 
energy obtained from both simulations are provided in Table I. As can be seen, the 
values obtained from the conventional MD and the self-guided MD methods are 
essentially the same, indicating the self-guiding force does not change 
significantly these ensemble average properties. 
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Next the conformational distributions in the c[)-\Jj space (the Ramachandran 
plot) obtained from conventional MD and the self-guided MD simulations are 
plotted in Figure 2 and Figure 3, respectively. Both maps reveal that there are four 
high-density regions, which are labeled as regions I to IV in Figures 2 and 3, 
5 respectively. These four high-density regions correspond to four local minima for 
the alanine dipeptide. In both contour maps, the <j> and t|j values for these four 
local minima are (-75,75), (55, 35), (-65,-45), and (65,-75), respectively. 
According to the size of each region, region I was defined as the area within 50 
degrees of peak I, region II as the area of 10 degrees within peak II, region III as 

10 the area of 10 degrees within peak III, and region IV as the area of 30 degrees 
within peak IV, respectively. No significant difference was found, when these two 
contour maps were superimposed. A small difference was observed in region II, 
where the self-guided MD simulation results in a sharper peak than the MD 
simulation. These results demonstrate that the conventional MD and the self- 

15 guided MD simulations yield almost identical results for the alanine dipeptide 
system in both the ensemble averages and the conformational distribution of the 
system. 

The conformational searching efficiency of these two simulation methods 
was then compared. Four local minima were found in the conformational space 

20 as shown in Figures 2 and 3, Energy barriers exist between these four local 
minima, as evident from the low conformational density in the contour maps. The 
system needs to overcome these energy barriers in order to transfer from one 
region to another and to sample properly the cJ>-t|t conformational space during a 
simulation. Therefore, the frequency of the transitions between local minima is a 

25 measure of the ability to overcome the energy barriers as well as the 
conformational searching frequency. The transitions between local minima were 
counted for both simulations and the results were summarized in Table II. It is 
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clear that more transitions between the minima occurred in the self-guided MD 
simulation than in the conventional MD simulation. These results suggest that an 
improved conformational searching efficiency was achieved using the self-guided 
MD simulation method. 
5 2. Alanine dipeptide simulations in aqueous solution 

The alanine dipeptide was solvated using 212 TIP3P water molecules, 
enclosed in a cubic box. The box size was set to make the density of the system 
to be 1.0 g/cm 3 . Two comparative 5,000 ps simulations were carried out at 300 K 
for the alanine dipeptide in aqueous solution using the conventional MD and the 

10 self- guided MD methods with the AMBER force field- Systematic motion in 
explicit water is generally much slower than that in vacuum and for this reason, 
a relatively bigger X value of 0.5 was employed to increase the guiding effect of 
the self-guiding force and a relatively longer averaging time t, of 2 ps was chosen 
for the self-guided MD simulation. Trajectories were recorded every 2 ps in both 

15 simulations for analysis. 

The same ensemble average properties examined for the simulations in 
vacuum were evaluated for the solvated dipeptide system and the results are also 
provided in Table I. Again, the values obtained from the two simulations are 
essentially the same, indicating that the self-guiding force does not change 

20 significantly the ensemble average properties for the system in aqueous solution. 

The conformational distributions in the 4>-i|j space obtained from these two 
simulations are plotted in Figures 4 and 5, respectively. As can be seen in the 
figures, the conformational distribution obtained for these two simulations is 
similar. Region III, which is located at the same position, around (-60,-60) on 

25 both maps, has the highest conformational density. Two other regions with high 
conformational density, I and II, are also located at the same position on both 
maps. Region IV, which can be easily identified as a local minimum in Figure 5, 
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was only accessed a few times during the conventional MD simulation, as shown 
in Figure 4. The major difference between these two maps is that the 
conformations are more concentrated around the peaks in Figure 5 than in Figure 
4 and this is particularly obvious for conformations in regions I and II. This 
5 suggests that the self-guided MD simulation accesses more often the low energy 
conformations than the conventional MD simulation. However, it is important to 
note that the positions of the global minimum and other local minima are the same 
in both maps. 

The conformational searching efficiency for the simulations in aqueous 
10 solution was then compared. As can be seen from Figures 4 and 5, four regions 
of local minima were found in the conformational space and energy barriers that 
exist between these four local minima, as shown by the low conformational 
density in the contour maps. Four regions are defined as region I: 75 degrees 
around (-60,105); region II: 45 degrees around (45,60); region III: 50 degrees 
15 around (-60,-60); IV: 50 degrees around (60,-100). The transitions between the 
four regions were counted for both simulations and the results are also 
summarized in Table II. It is clear that more transitions between the four regions 
occurred in the self-guided MD simulation than in the conventional MD 
simulation. 

20 Since each local minimum only covers certain range of (J) and i|i angles, 

another way to examine the frequency of transitions between local minima is to 
examine the time profiles of (j) and ijr, of which are plotted in Figures 6(a) for <J> 
and 6(b) for i(r, respectively. As can be seen from Figure 6(a), more frequent 
transitions occurred for the 4> angle during the self-guided MD simulation than 

25 during the conventional MD simulation and from Figure 6(b), similar results were 
obtained for the i|f angle. These results clearly showed that the guiding force in the 
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self-guided MD simulation method improves the conformational searching 
efficiency for the alanine dipeptide in the aqueous solution. 
(B) The 16-Residue Synthetic Peptide System 

When a system is large, the conformational space of the system becomes 
5 more complicated thus resulting during an MD simulation in the system being 
easily trapped in one local minimum or another. The proper sampling of the entire 
conformational space and the identification of the global minimum of a large 
system becomes an extremely difficult or even an impossible task to accomplish 
using the conventional MD simulation method at physically moderate 

10 temperatures. In this example, in order to illustrate the advantages provided by the 
self-guided molecular dynamics simulation for large systems, simulation studies 
on the synthetic 16-residue alanine peptide both in vacuum and in explicit water 
were conducted. This peptide is neutral, water soluble and has been determined 
to have ca. 50% helix secondary structure in aqueous solution. 

15 1. Simulations in vacuum 

Two comparative 10,000 ps simulations were performed at 300 K using the 
conventional MD and the self-guided MD simulation methods using the AMBER 
force field. In the self-guided MD simulation, A of 0.1 and t, of 0.2 ps were 
employed. Trajectories were recorded every 10 ps during the simulations for 

20 analysis. 

In the conventional MD simulation starting from a fully extended 
conformation, the peptide collapsed to a compact structure within 1 00 ps (Figure 
7). Continued simulation to 10000 ps did not result in much conformational 
change, indicating that the peptide was trapped in a local minimum. In the self- 
25 guided MD simulation at 300 K starting from the same fully extended 
conformation, the peptide folded into a helix structure within 100 ps (Figure 8). 
With continued simulation to 10,000 ps, significant conformational changes 
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occurred in local regions of the peptide but overall the peptide maintained the 
helix structure. 

To further investigate the ability to overcome energy barriers, another self- 
guided MD simulation was performed starting from the "trapped structure" of the 
5 peptide obtained at the end of the 1 0,000 ps of conventional MD simulation. It was 
found that within 1,000 ps, the peptide folded into a complete helix structure 
(Figure 9). The helical structure obtained is similar to that obtained from the self- 
guided MD simulation starting from the fully extended conformation (Figure 7). 
Continued self-guided MD simulation to 10,000 ps showed that the peptide 

10 underwent some conformational changes in local regions but overall, the helix 
structure was maintained. 

The potential energy of the peptide in these three simulations is plotted 
against the simulation time in Figure 10. As can be seen in the figure, in the 
conventional MD simulation, for the first 100 ps, the potential energy of the 

15 peptide decreased quickly and reached -505 kcal/mol. After 100 ps, the potential 
energy of the peptide fluctuated between -505 and —515 kcal/mol throughout the 
remaining period of the 10,000 ps simulation. In the self-guided MD simulation 
starting from the extended conformation, the potential energy of the peptide 
reached -520 kcal/mol and the complete helix formed within 100 ps. The 

20 potential energy of the peptide fluctuated between —520 and —530 kcal/mol 
throughout the remaining period of the 10,000 ps simulation. In the self-guided 
MD simulation starting from "the trapped structure" obtained from the 
conventional MD simulation, it was found that the potential energy of the peptide 
first went up to —495 kcal/mol from -506 kcal to overcome a large energy barrier 

25 then decreased quickly to -505 kcal/mol. At 500 ps, the potential energy went up 
again to -494 kcal/mol to overcome another large energy barrier and then 
decreased to -522 kcal/mol at ca. 1 ,000 ps simulation, when a complete helix was 
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formed. After this point, the potential energy of the peptide fluctuated between 
-520 and -530 kcal/mol, similar to the self-guided MD simulation starting from 
the fully extended conformation. From Figure 10, it can be seen that the potential 
energy jumps are much more frequent in the self-guided MD simulations than in 
5 the conventional MD simulation, which indicates more efficient conformational 
sampling by the self-guided molecular dynamics sirnulaiton. 
2. Simulations in aqueous solution 

Folding of the 16-mer peptide in aqueous solution through MD simulations 
would be difficult to achieve because the systematic conformational change of the 

10 peptide in aqueous solution is very slow due to the strong and random collisions 
of the peptide with its surrounding solvent molecules. Helix folding has been the 
subject of many simulation studies. As a result, de novo helix folding of large 
peptides in explicit water through MD simulations, is very hard to achieve, 
presumably because the systematic conformational change of large peptides in 

1 5 explicit water is very slow. 

In illustrating the ability of the self-guided MD simulations according to the 
invention to describe conformational change in large molecular systems including 
explicit molecules, a total of 1 879 TIP3P water molecules were used to solvate the 
16-mer peptide in its extended conformation (Figure 11) and the density of the 

20 system was set to be 1 .0 g/cm 3 . The peptide was placed along the diagonal of the 
cube to minimize the box size. At the beginning of the simulation, the closest 
distance between the peptide and its images was 7.8 A. As the simulation 
proceeded, the peptide began to fold and the closest distance between the peptide 
and its images became bigger. Overall, the interactions between the peptide and 

25 its images are not significant for the system. 

Comparative, 200 ps simulations were performed at 300 K with both the 
CHARMM and AMBER force fields using both the conventional MD and the self- 
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guided MD methods. In the self-guided MD simulation, X of 0.5 and t, of 2 ps 
were used. Each of these 200 ps simulations took ca. 80 CPU hours on the SGI 
indigo2 system with a single R10000 processor. 

During the 200 ps conventional MD simulations using either the CHARMM 

5 or AMBER force fields, no significant conformational changes occurred for the 
peptide. Snapshots of the structure of the peptide during the 200 ps conventional 
MD simulation using the CHARMM force field are shown in Figure 12. The 
structure of the peptide was merely relaxed and its N-terminal end was bent 
slightly. To observe significant conformational changes of the peptide, much 

10 longer simulation time is required. 

In contrast, significant changes in the conformation of the peptide were 
observed during the 200 ps self-guided MD simulation using either the CHARMM 
or AMBER force fields. 

In order to illustrate the advantages of the self-guided MD technique, a 

15 detailed analysis of the conformational change of the peptide during the 200 ps 
self-guided MD simulation using the CHARMM force field is provided as follows. 
At 1 8 ps, the first hydrogen bond between the carbonyl group (CO) of A*, (A^ 
refers to Alanine residue at the 9th position in the sequence) and the amido group 
(NH) of A 12 was formed. At 20 ps, another hydrogen bond between the CO of A 7 

20 and the NH of A, 0 was formed and at 22 ps, the first a -helix turn was formed 
between residues 8 and 12. The formation of this a-helix turn promoted a 
hydrogen bond between the CO of A 10 and the NH of Q l3 at 24 ps. At 26 ps, a 3 10 
helix segment was formed from residues 1 1 to 14. This helix segment propagated 
toward the C-terminal and at 40 ps, the helix segment extended to residues from 

25 8 to 16. This long helix segment began to break at 44 ps from the C-terminal and 
the helix propagated toward N-terminal at 50 ps. At 66 ps, a hydrogen bond was 
formed between the CO of A, and the NH of A 5 . At 70 ps, the helix segment 
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shifted to residues from 5 to 1 1 . At 74 ps, a 3 10 helix was formed between residues 
3 and 6 and another helix segment was formed from residues 6 to 12. At 80 ps, 
a long helix segment was formed from residues 1 to 13. At 86 ps, a helix was 
formed throughout the entire peptide with a slight distortion at its C-terminal. The 
helix structure was broken shortly after 100 ps into two helix segments, one from 
residues 1 to 7 and another one from residues 8 to 16. The conformational changes 
of the peptide continued to the end of the 200 ps simulation. A number of 
snapshots are shown in Figure 13. The results obtained from the 200 ps self- 
guided MD simulation using the CHARMM force field are consistent with the 
experimental results that showed that this peptide has ca* 50% helix structure in 
aqueous solution. 

During the 200 ps self-guided MD simulation using the AMBER force field, 
the 16-residue peptide also folded into variety of helical segments, but the 
structural details differed from those obtained using the CHARMM force field. 
A number of snapshots of the structure of the peptide during the self-guided MD 
simulation using the AMBER force field are shown in Figure 14. The analysis of 
the conformations of the peptide during the simulation using the AMBER force 
field showed that the helix form is the dominant structural element of the peptide, 
again consistent with the experimental results. 

Taken together, the comparative simulations on the 1 6-mer peptide in 
aqueous solution using the conventional MD and the self-guided MD methods 
clearly show that the self-guided MD simulation method provides a dramatic 
improvement in conformational searching efficiency. It is important to point out 
that the improvement in conformational searching efficiency does not depend 
upon the force field used in the simulation, although the structural details clearly 
depend upon the force field used in the simulation. Our results indicate that it is 



WO 00/10067 



PCT/US99/18302 



-45- 

feasible to study the folding dynamics of relatively large peptide systems in 
explicit water using the self-guided MD simulation method of the invention. 

EXAMPLE 2 

This example illustrates the advantages provided by the self-guided MD 
5 technique of the invention in analyzing folding mechanisms. One important step 
in protein folding processes is the formation of segments of secondary structure, 
such as a-helical segments. Therefore, analyzing the mechanisms associated with 
the formation of such secondary structures is an important step towards designing 
computational techniques capable of predicting protein territory structures. 
10 Helix is a rudimentary secondary structure in proteins and helix formation 

has been found to play an important role in protein folding, which remains one of 
the most challenging and difficult problems in biological science. Elucidation of 
helix folding mechanism can therefore provide important insights into the protein- 
folding problem. 

15 In this example, we performed a 10-ns SGMD simulation of a 16-residue, 

alanine-based peptide in explicit waters using a set of SGMD guiding parameters 
as described in Example 1. In addition, we performed a 10 ns simulation of the 
peptide under the same simulation condition with conventional MD method. 
Furthermore, to investigate the effects of the guiding parameters to the helix 

20 folding in the SGMD simulations, we performed another 10 ns SGMD simulation 
with different guiding parameters. It was found that both 10 ns SGMD simulations 
were able to achieve reversible helix folding and that the 10 ns conventional MD 
simulation was not sufficiently long for the formation of a single helix segment. 
Analysis of the SGMD simulation trajectories provides us the unprecedented 

25 opportunity to study the helix folding mechanism in water. The simulation 
conditions and results are discussed below. 
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Simulationt Setup 

In this example, the peptide aqueous solution was constructed by immersing 
the 16-mer peptide molecule in its fully extended conformation in a box of 1945 
TIP3P waters and deleting overlapping water molecules that were within 2-45 A 

5 of the peptide. Cubic periodic boundary conditions were applied in the simulation 
to eliminate the boundary effect and a SHAKE algorithm was used to fix all bond 
lengths during the simulation such that large time step of 0.002 ps can be 
employed. NTP simulations were performed (T=274K and P= 1 atm). 
Temperature and pressure are controlled using conventional algorithms as 

10 described above. The temperature was coupled to a temperature bath with a 
coupling constant of 1 .0 ps, and the pressure coupling constant was set to be 1 .0 
ps. Conventional MD simulation was performed using the AMBER program, 
while SGMD simulations were performed using the SGMD method implemented 
in the AMBER program. AMBER94 all atom force field (Cornell et al. 1995) was 

1 5 used to represent the peptide. A constant dielectric constant of 1 was used for the 
electrostatic interaction calculations. The cut-off value for non-bonded 
interactions, including Lennard- Jones and electrostatic interaction was set at 10 
A based on residues. 
Secondary Structure Analysis 

20 To provide a detailed description of the secondary structure of the peptide 

during the simulation, the dictionary of secondary structure of protein (DSSP) 
proposed by Kabsch & Sander was employed to analyze the secondary structural 
elements of the peptide conformation. According to the DSSP definition, 
cooperative secondary structure is recognized as repeats of the elementary 

25 hydrogen-bonding patterns "turn" and "bridge". Repeating turns are "helices", 
repeating bridges are "ladders", connected ladders are "sheets". Geometric 
structure is defined in terms of the concepts of torsion and curvature of differential 
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geometry. Local chain "chirality" is the torsion handedness of four consecutive 
Ca positions and is positive for right-handed helices and negative for ideal twisted 
b-sheets. Curved pieces are defined as "bends". For the 16-residue alanine-based 
peptide, the two terminal blocking groups, the acetyl at the N-terminal and the 

5 amide at the C-terminal, are counted as separate amino acids when defined the 
secondary structure of residues 1 to 16. 
A. Conventional MP Simulation 

We have performed a 1 0-ns conventional MD simulation on the peptide 
system. Analysis of the simulation trajectory showed that no helix segment was 

10 formed during the entire 10 ns simulation. In Figure 15, the gyration radius was 
plotted against the simulation time. As can be seen, the lowest gyration radius 
reached during the simulation is around 9 A. SGMD simulations showed that 
when substantial helix segments are formed within the peptide, the gyration radius 
decreases to a value between 6 and 8 A. Thus, the conformational change of the 

15 peptide segments during the 10 ns conventional MD simulation is not large 
enough for the formation of helix segments and is certainly far from sufficient for 
reversible helix folding. Based upon the recent 1 ms simulation experiment 
performed in water by Duan and Kollman (1998), substantial helix segments are 
formed after 50 ns. It is thus expected that the helix formation of this alanine- 

20 based peptide in water might also occur in a similar time-scale. Using the 
conventional MD simulation method, simulation time in the order of hundreds of 
picoseconds may be required in describing reversible folding of the 16-mer 
peptide. Since each 10 MD ns simulation takes about 10 weeks of CPU time on 
an R 10000 SGI workstation, it will require substantial computing resource to 

25 perform a few hundred ns simulations at the present time. 
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B. The First 10 ns SGMD Simulation 

The first 10 ns SGMD simulation was performed with an averaging time 
t, equal to 2ps and a guiding factor X equal to 0.4. 

The gyration radius of the peptide was also plotted against the simulation 

5 time (Figure 16). As can be seen, the gyration radius reached 8 A within 100 ps. 
Thereafter, the gyration radius fluctuated between 6 and 9 A. Based upon the 
gyration radius change, SGMD simulation with this set of guiding parameters 
appears to accelerate the conformational change of the peptide by more than 100- 
fold as compared to conventional MD simulation. 

10 The Ramachandran plot obtained for the 16-mer peptide based upon the 

5000 conformations recorded during the 10-ns SGMD simulation is shown in 
Figure 16. As can be seen, the peptide has essentially accessed all the allowed 
conformational regions observed in protein structures, indicating that during the 
10-ns SGMD simulation, sufficiently large conformational space was sampled. 

1 5 The most dominant backbone conformation is around the «-helix region (-57, -47), 
followed by the 3 i0 -helix region (-50, -20). Interestingly, the left-handed «-helix 
region (57, 47) and the P-structure region (-140, 140) are also populated. A 
significant population is located in a large, random coiled conformational region 
(-30 to -100,0 to 180). 

20 To provide a detailed description of the "motion" of each residue of the 16- 

mer peptide during the simulation, we classified the 5000 conformations recorded 
during the simulation according (DSSP). The average ratio of each secondary 
structural element (<*-helix , 3 l0 -helix , K-helix , turn, bend, bridge and ladder) for 
each residue is calculated in the peptide based upon all the 5000 conformations 

25 recorded during the simulation and provided in Table III. 
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Table Til. The ratios of secondary structural elements observed for each 

amino acid in the peptide. 



Residue # 


Bend 


Bridge 


Ladder 


Turn 


3 10 -Helix 


a-Helix 


p-Helix 


All Helix 
(a+3, n +p) 


i 
j. 


0 


0 


o 


9 . 5 


0 . 6 


75 . 0 


0.5 


16,2 




9 . 3 


0 


0 


10 . 3 


0 - 9 


77 .4 


0.6 


78. 9 


•a 
j 


5 - 8 


0 . 2 


0 . 3 


8 . 0 


1 . 0 


78 . 7 


0 . 7 


80.5 


4 


4 . 0 


0 


0 . 3 


4 . 5 


0.6 


81 . 9 


0.7 


83 . 1 


5 


2 - 8 


0 . 1 


0 


7 . 6 


4 * 1 


82 . 4 


0.7 


87 . 1 


6 


5.5 


0 


0 


16.3 


5.8 


69.0 


0.3 


75.1 


1 


7.7 


0 . 1 


0.3 


18 . 6 


6.5 


62.3 


0.2 


69.0 


8 


10.5 


0.2 


0.3 


20.2 


5.4 


50.1 


0.7 


56 . 2 


9 


15.0 


0 


0 


7.9 


2.6 


45.9 


0.7 


49. 1 


10 


7 .4 


0 


0 


10 - 0 


2.2 


71.0 


0.7 


73.9 


11 


4.5 


0 


0 


22 .2 


3.9 


66. 7 


0.7 


71.3 


12 


7.4 


0 


0 


23 . 9 


5.3 


55.4 


0.6 


61.3 


13 


9.9 


0 


0 


19-4 


7.1 


50.7 


0 . 1 


57.9 


14 


13 .5 


0 


0 


25 . 5 


7.7 


45.5 


0 . 1 


53 .3 


15 


23-2 


0 


0 


25 . 0 


4.6 


33 .4 


0 . 0 


38 . 0 


16 


0 


0 


0 


25.5 


2.1 


10 .6 


0 . 0 


12 .7 


Avg . 


7 . 9 


0 


0 . 1 


15-9 


3.8 


59.8 


0.5 


64 . 0 



As can be seen from Table III, on average, helix secondary structure, 
including 3 10 -, and Tt-helix, is the dominant form for the peptide and the 
average helix ratio is 64%. Although the peptide may have not sampled all 

25 possible configurations during the 10 ns SGMD simulation, it is noted that the 
average helix ratio obtained from the simulation is comparable to the experimental 
helix ratio of 50%, as estimated from CD experiments (Scholtz, et al., 1991). 
Among the three different helix elements, ~-helix is the dominant form (59.8%) 
and there is a significant population of the 3i 0 -helix (3.8%). The least populated 

30 helix form for this peptide is u-helix (0.5%). Our results thus clearly show that 
the «-helix conformation is much more populated than the 3 10 -helix conformation. 
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This is consistent with a previous study for this peptide system, which showed that 
«-helix is more stable than 3 10 -helix in aqueous solution. 

A number of recent experimental studies using double spin label electron 
spin resonance (ESR) have suggested that a significant population of the 3 10 -helix 

5 conformation form may exist for short alanine-based peptides. Based upon our 
simulation studies, it is clear that there is indeed a significant population of the 3 J0 - 
helix conformation (3.8%) for this peptide in solvated form. Thus, the UK 
simulation provides direct evidence for the existence of the 3 10 -helix for alanine- 
based peptides in water. It is of note that the 3 l0 -helix conformation was mainly 

10 observed for two segments, residues 5 to 8 in the middle of the peptide and 
residues 1 1 to 15 near the C-terminus. Using a double spin label ESR technique, 
Fiori et ah have found that there is a strong evidence of 3 10 -helix geometry near 
the C-terminus for 16- and 21-mer alanine-based peptides and our results are in 
good agreement with those experimental observations. 

15 "Turn" is the next most populated secondary structural element. This 

structural element is believed to play an important role in helix initiation and 
propagation. Indeed, the self-guided MD simulation conducted according to the 
invention show that 87.5% of helix transitions (folding and unfolding) are between 
helices and turns, confirming the important role played by turns in helix folding. 

20 We observed a small population of bridges and p-strands, which mainly involve 
residues 3, 4, 7, and 8. Their low populations suggested that they are transient 
states under the simulation conditions. Left-handed helix was also observed from 
6130 ps to 6560 ps. located at the C-terminus. 

It is of interest to note that the helix ratio for each residue depends upon not 

25 only the nature of the residues (Ala, Tyr, Gin) but also the location of the residues. 
Very low helix ratios were observed for the C-terminal residues 15 and 16, being 
only 38.0% and 12.7%, respectively. From the N- to the C-terminal, two peaks of 
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the helix ratio were found, located at residues 5 and 10 and separated by residue 
9. To further investigate the helix folding of the peptide during the simulation, for 
each conformation, each residue was classified as helical or non-helical according 
to the DSSP. The helix content of the peptide for each conformation was plotted 

5 against the simulation time (Figure 1 7). The helix segments were formed within 
the first 100 ps of the simulation. During the 10 ns SGMD simulation, helix 
folding, unfolding and refolding occur frequently, suggesting that the peptide 
structure was not trapped in a particular conformation. Figure 1 8, shows the total 
number of helical residues for each conformation against simulation time. As can 

1 0 be seen, within the first 500 ps, the peptide formed almost a complete helix, except 
for one residue break in the middle of the peptide. The peptide then partially 
unfolded around 1000 ps and refolded thereafter. At 2400 ps, the peptide 
completely unfolded. It refolded again into almost a complete helix with a frayed 
C-terminus around 3000 ps. At 4000 ps, the peptide completely unfolded again but 

1 5 refolded into a complete helix with a frayed C-terminus at 5200 ps. During the 10- 
ns SGMD simulation, the peptide completely unfolded 4 times. These data clearly 
showed that the reversible folding (folding, unfolding and refolding) of the peptide 
is achieved during the 10 ns SGMD simulation. 

Based upon our simulation, it is clear that the peptide doesn't simply adopt 

20 either a complete helix or random coil conformations but has much rich 
conformational forms. We analyzed the major conformational clusters according 
to helix (H) and coil (C) definition. A total of 9 major conformational clusters 
were identified, as shown in Figure 19. As can be seen, the complete helix 
conformation only accounts for only 0.1% of the total number of the 

25 conformations. The coiled conformation (C) also accounts for 0.8% of the 
conformations. The most populated conformational cluster (HC, 38%) has a helix 
segment at the N-terminal and a frayed C-terminal. The next two most populated 
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conformational clusters have either one (CHC) or two helix segments (CHCHC) 
in the middle of the peptide and with frayed N- and C-terminals, accounting for 
8.9% and 7.9% of the total conformations, respectively. The next most populated 
conformation (HCH) has two helix segments at the N- and C-terminal and with a 
break in the middle of the peptide (5%). The conformation with a C-terminal helix 
segment and a frayed N-terminal (CH) accounts for 4% of the total conformations. 
The conformation with a C-terminal helix segment, followed by a coiled segment, 
a helix segment near the N-terminal and a frayed N-terminal (CHCH) accounts for 
about 2% of the total population. Lastly, the conformation with a helix segment 



10 at the N-terminal, followed by a coiled segment, a small helix segment and a 
L 2 frayed C-terminal (HCHC), accounts for less than 1% of the total conformations. 

uJ Our results thus showed that the alanine-based peptide adopts multiple 

conformations and the complete helix conformation only accounts for a very small 
percentage of total conformations. Accordingly, the experimentally observed 50% 
p 15 helix content for this alanine-based peptide should represent an average of several 

major conformational forms, each of which has one or more helix segments. 

The self-guided MD trajectory was further analyzed as follows. Helix 
initiation starting from the fully extended conformation was first examined. Figure 
20 shows the secondary structure evolvement during the first 400 ps of the 
20 simulation. As can be seen in the figure, starting from the fully extended 
conformation, the peptide first forms bends and turns. Thereafter, some turn 
structures further evolved into 3 10 -helix segments. Segments of 3, 0 -helix can 
either transfer back to turns or further evolve and form ^-helices. 

In order to show that more precise understanding of the helix initiation 
25 mechanism can be gained through self-guided MD simulation, the total number 
of transitions between different secondary structural elements were analyzed. 
During the 10 ns simulation, there were a total of 250 transitions between 3 10 -helix 
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segments and turns, 150 transitions between «-helix segments and turns, 17 
transitions between 7t-helix segments and turns and 120 transitions between 3 10 - 
helix and ~-helix. There were only 9 transitions between bends and 3, 0 -helix 
segments. These results thus show that turns play an important role in helix 
5 initiation. Furthermore, our results indicate that there are more frequent transitions 
between turns and 3 10 -helices than between turns and «-helices, although «-helix 
is a much more populated secondary structural element than 3 10 -helix for this 
peptide during the simulation. However, there are also frequent and direct 
transitions between <*-helix and turns. Therefore, starting from turn conformations, 
1 0 «-helix can be formed directly or through the 3 i0 helix. Based upon the foregoing 
results, a general helix initiation mechanism is proposed, as shown in Figure 21. 

Previously, Millhauser proposed a helix folding mechanism based upon 
experimental data obtained from X-ray crystallography, CD, NMR and ESR and 
theoretical and simulations studies (Millhauser, 1995). The proposed key 
15 initiation mechanism show in Figure 21 is in excellent agreement with a 
mechanism based on X-ray crystallograph, CD, NMR and ESR experimental data 
as proposed by Millhauser. It is of note that the nascent helix conformation in 
Millhauser' s helix folding mechanism resembles closely the turn conformation 
identified through the self-guided MD simulation of the invention. In both models, 
20 nascent helices or turns were formed first from random coiled conformations. The 
nascent helix further evolves into 3 10 -helix, which can subsequently form «-helix. 
Both models identified the important roles played the nascent helix (turn) and the 
3 10 -helix in the folding of ~-helix. One important extension in our helix folding 
model to Millhauser's helix folding model is that although turns (nascent helices) 
25 more readily fold into 3 10 -helices, a significant percentage of turns conformations 
can fold into «-helices without going through the 3 I0 -helice as the intermediate. 
C. The Second 10-ns SGMD Simulation with Different Guiding 
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Parameters 

The impact of the guided motion parameters on the results of a self-guided 
MD simulation were investigated as follows: 

A 10-ns SGMD simulation of the 16-mer peptide based on a different set 
5 of guiding parameters (an averaging time t,=2 ps and a guiding factor X=03) was 
carried. 

In Figure 22, the gyration radius and the total number of helical residues are 
plotted against the simulation time. Based upon the rate of the gyration radius 
change, it is clear that the conformational change of the peptide during this SGMD 

10 simulation with a smaller guiding factor X of 0.3 is significantly slower than that 
during the SGMD simulation with a guiding X factor of 0.4, but much faster than 
the conformational change of the conventional MD simulation. Based upon the 
total number of helical residues, the peptide formed substantially «-helix at 3500 
ps and reached almost a complete helix at 4000 ps, similar to the conformation 

15 observed at 430 ps of the SGMD simulation with a guiding factor X of 0.4. During 
the 10 ns SGMD simulation with a guiding factor X of 0.3, folding, partial 
unfolding and refolding were observed. However, the peptide didn't experience 
complete unfolding, unlike that observed during the 10 ns of the SGMD 
simulation with a guiding factor of 0.4. The helix profile of the 10-ns SGMD 

20 simulation with a guiding factor X of 0.3 is very similar to that of the first 2-ns 
SGMD simulation with a guiding factor X of 0.4. Nevertheless, both SGMD 
simulations showed that the peptide conformation is predominantly helix in 
explicit water, consistent with experimental results. 

During the 10-ns SGMD simulation with a guiding factor X of 0.3, helix 

25 folding, partial unfolding and refolding occurred many times and this provided 
another opportunity to examine the helix initiation mechanism and to validate the 
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results obtained from the 10-ns SGMD simulation with a guiding factor k of 0.4. 
It was found that again turns play an important role to initiate helices (oc-helix, 3 10 - 
helix and u-helix). During the 10-ns SGMD simulations, there were a total of 176 
transitions between turns and 3 !0 -helices, 112 transitions between turns and «- 
5 helices and 36 transitions between turns and 7i-helices. For transitions involving 
«-helix, there were 112 transitions between turns and <*-helices, 41 transitions 
between 3 10 -helices and ^-helices, and 4 transitions between rc-helices and <*- 
helices. These results show that a-helices are mainly initiated through the 
formation of turns. However, turns are more easily transferred to 3 10 -helix 
10 segments and 3 10 -helices also played an important role in the folding and 
unfolding of «-helices. 

Comparison of the two SGMD simulations with different guiding 
parameters showed that with respect to the helix initiation, although the absolute 
number of transitions between turns, «-helices, 3 10 -helices and 7t-helices differ 
1 5 between these two simulations, both simulations clearly indicate that turns play a 
dominant role in the formation of helices («-, 3 l0 - and n -helices). Moreover, both 
simulations clearly show that turns most likely transfer to 3 10 -helices, while least 
likely transferring to 7i-helices among different forms of helices. While most ~- 
helices are directly formed from turns, a significant percentage of «-helices are 
20 initiated from 3, 0 -helices. Furthermore, both simulations indicated that helix is the 
dominant conformational form of this alanine-based peptide in aqueous solution, 
consistent with experimental results. 
EXAMPLE 3 

In order to further examine the impact of the guiding parameters on a self- 
25 guided molecular dynamics simulation, the following computer experiments were 
performed on a system including the peptide YPGDV. The chemical structure of 
YPGDV is shown in Figure 23. The peptide was solvated using 805 TIP3P water 
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molecules in a cubic box and the box size was set to be 29A. Cubic periodic 

boundary conditions were applied in all simulations. The CHARMM22 force field 

was employed. The nonbonded interactions were smoothly truncated. The cutoff 

distance for non-bonded interactions was set at 13 A and a switch function was 

5 applied between 8 to 12 A to smooth the change across the cutoff. A neighbor list 

was built and updated every 10 MD steps for non-bonded interaction calculation. 

The simulation temperature was set at 300 K. The SHAKE algorithm was 

employed to constrain all bonds so that a time step of 0.002 ps can be utilized in 

the simulations. The trajectories were recorded every 2 ps for later analysis. For 

10 the SGMD simulations, the local averaging time t, is set at be 2 ps. To examine 

the guiding effect on conformational search, five SGMD simulations were 

performed with the guiding factor A,=0.1, 0.2, 0.3, 0.4, and 0.5, respectively. 

1. Conformational space searched during 2 ns MD and SGMD 
simulations 

15 In a 2 ns MD simulation starting from an extended conformation, the 

peptide experienced very slow conformational change. The water molecules 
surrounding the peptide formed a hydration shell presenting large energy barriers 
that prevent certain conformational transition and exerts damping effects on the 
movements of the peptide. A Ramachandran plot of (j> and Y dihedral angles 

20 around each peptide bond separately is shown in Figure 23. As can be seen, of the 
eight dihedral angles, only <J> 3 and <f> 4 angles reached their whole range, while other 
angles just a oscillated around their starting region. Figure 24 shows some typical 
conformations obtained during the simulation. Starting from extended 
conformation (0 ps), the peptide relaxed to coil conformations such as the one at 

25 370 ps with bends on it. It evolved into conformations with a type I turn like 
strucure over residues 2 to 5. After about half a nanosecond, this type I turn like 
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structure dispeared and the peptide remained in a coiled conformation for the rest 
of the simulation. 

In contrast, in a 2 ns SGMD simulation with t,=2 ps and X=0.5, much wide 
regions have been searched (Figure 25). In addition to the regions reached in the 
5 MD simulation, f 3 found its high population region around 90°. 4) 4 and (J) 5 reached 
the region (0,90). Figure 26 shows some typical conformations obtained in the 
simulation. From the same extended conformation, the peptide quickly bended at 
residues 2 and 3 (at 32 ps) and formed a type II turn structure on residues 1-4 (34 
ps). This type II turn structure is a fairly stable one and remained for the most of 
10 the simulation period. The conformational change was then localized in the C- 
terminus, which reached out to interact with water (120 ps) or folded back to 
interact with the N-terminus region (304 ps). 

2. Acceleration of Conformational search in SGMD simulations 
A series of 2 ns SGMD simulations with different guiding factors, A=0.0, 
15 0.1, 0.2, 0.3, 0.4, and 0.5. Here, X=0.0 corresponding to the MD simulation. With 
these different guiding factor simulations, we can see the conformational space 
explored increases to a limit, and the conformational transition reach a reversible 
process. 

To describe the conformational space accessed in these simulations the 
20 conformational clustering technique presented above is employed. Figure 26 
shows the center conformation of nine major clusters observed in these 
simulations. The backbone <t>, t|r dihedral angles are listed in Table IV. The 
conformational transition among clusters during these simulations are shown in 
Figure 27. 
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did not result in reversible conformational change. That is, the peptide did not 
have repeated and stable visits to a cluster. At A,=0.4, repeat and stable 
conformational transitions between cluster II and III were observed. When 1=0.5, 
the peptide transferred to cluster IV after 32 ps and stayed there for about 100 ps 
5 before transferring to cluster I. 

The above results show that for SGMD simulations to provide enhanced search 
(A.=0.3, 0.4, and 0.5) should be employed. 

3. Comparison with NMR experimental results 

The solution conformation of the pentapeptide, YPGDV, has been extensively 

10 studied using rotating frame NOESY(ROESY) NMR techniques. Their study 
showed that the Asp4 amide proton temperature coefficient for this peptide was 
only 3.3 ppb/K, considerably lower than that for other pentapeptides investigated 
in their study. This suggested that a high population of P-turn in solution might 
exist. Indeed, the NOE that was observed for this peptide confirmed its P-turn 

15 conformation in solution. For example, a strong NOE between the Gly 3 and Asp 4 
amide protons was observed. The sequential cross peak between the NH of Asp 4 
and the C«H of Gly 3 is reduced in intensity as compared to that between the Asp 4 
CaH and Val 5 NH and between Proj CaH and Gly 3 . Of great importance, a cross 
peak between the CaH resonance of Pro 2 and the NH resonance of Asp 4 was 

20 observed. Taken together, the very low Asp4 NH temperature coefficient and the 
NN(3,4) and <*N(2,4) NOEs provide unequivocal evidence for a relatively high 
population of a 1-4 hydrogen bonded P-turn in this peptide. Formation of P-turn 
by this peptide was also confirmed independently by circular dichroism. The 
presence of a significant population in water solutions of this peptide is indicated 

25 by the positive ellipticity with a maximum near 206 nm and a minimum near 190 
nm.. 
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In the MD simulation (A,=0.0) starting from an extended conformation which 
belong to the coil cluster (VI), the peptide transferred to cluster VII, which has a 
type I-like turn structure on residues 2 to 5, after 150 ps and remained there for 
about 500 ps. After 630 ps, the peptide briefly visited cluster I region for a couple 
5 of times and back to cluster VI and remained there for the rest of period with 
occasional visits to cluster VII. In the MD simulation, starting from a type II turn 
structure (cluster I), the peptide remained in cluster I all the time except in two 
cases where it became close to cluster VII. The lack of transitions in this 
simulation indicates than cluster I is likely a stable state for this peptide. 

1 0 When A,=0. 1 , the acceleration is not sufficient to make the peptide reach other 
important clusters in the 2 ns simulation. The peptide only briefly visited cluster 
VII and other clusters. A brief visit means that the peptide takes a conformation 
more similar to that in the cluster, but lack of stabilizing factors, such as hydrogen 
bonds and hydrophobic packing results in the peptide moving away from said 

15 cluster. At about 700 ps, the peptide transferred to cluster V, in which the 
terminus interaction induces two turn structures, one between residues 1-4 and the 
other one between residues 2-5. Residues 2, 3, and 4 formed a 3 10 helix segment. 
In the rest simulation, the peptide remained in this cluster. When X=0.3, the 
peptide reached cluster IX after 30 ps of simulation on time and remained there for 

20 more than 1 00 ps before transferring to cluster IV. 

When A.=0.4 the peptide transferred to cluster X and stayed there for about 1 00 
ps, then changed to cluster I at about 200 ps. In the next 600 ps, it stayed in 
cluster I with some brief visits to other clusters. After 900 ps the peptide 
transferred to cluster II and then to cluster III, and transfered between the two 

25 clusters during the rest of the period. From Table V we can see clusters I-IV all 
have a type II reverse turn on residues 1 to 4. But their C-termini have different 
positions relative to the turn structure. In simulations with A.<0.5, 2 ns simulation 
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First, the central conformations obtained by SGMD were compared with the 
NMR experimental results. As can be seen from Figure 26, clusters I-IV have a 
strong 1-4 hydrogen to bond between the carbonyl group of Tyr! and the amide 
group of Asp 4 , in good agreement with the low temperature coefficient of the 
5 amide protein of Asp 4 and the NOE between 2H« and 4HN measured in the NMR 
experiment. 

In NMR experiments, NOE represents average structural information in 
solution, rather than a specific conformation. To provide a more vigorous 
validation, the average NOEs were estimated from the SGMD conformations. 
1 0 Using the 1 000 conformations recorded during the 2 ns SGMD simulation with 
A=0.5, the ensemble average NOE effects between each relevant proton pair are 
calculated and listed in Table VI. As can be seen, the agreement between SGMD 
simulation results and the experimental NOEs is remarkable. 
EXAMPLE 4 

15 This example illustrates the simulation of a phase transition by self-guided 

molecular dynamics according to the invention. 

The SGMD simulation method was applied to L-J argon systems to examine 
its behavior and to demonstrate its ability in conformational search. 

A total of 500 argon atoms were enclosed in a cubic box and the cubic periodic 

20 boundary conditions were applied to the system except otherwise noted. The initial 
crystal structure was created by putting argon atoms at face-centered-cubic (fee) 
lattices, and the liquid structure was obtained by heating the crystal to 120 K and 
equilibrating for 1000 ps. A neighbor list was created for interaction calculations 
and updated every 10 MD time steps. A time step of 0.01 ps was employed in all 

25 our simulations. The cut-off distance for atom-atom interaction was set at 14.0 A. 
A switch function was applied between 10 A and 1 2 A to smooth the interaction 
change across the cutoff. 
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A system of 500 argon atoms is used in our comparative simulations as shown 
in Figure 28 tetragonal periodic boundary condition with sides a=b=28.53 A and 
c=57.06 A was applied to the system. Such a boundary condition makes the 
system look like films of condense phase separated by gas layers. Such a design 

5 allows the volume the system to change freely upon crystallization. The starting 
structure was created by melting a fee crystal at 120 K, cooling down, and being 
equilibrated at 60 K. The equilibrated argon liquid film was simulated using the 
conventional MD method and the SGMD method at tf=0.2 ps and X=0.02, 0.05, 
and 0.1. In the conventional MD simulation, it took 65 ns before crystallization 

10 occurred. While in the SGMD simulations, the crystallization occurred at 63, 2, 
and 0.5 ns with A=0.02, 0.05, and 0.1, respectively. Figure 29 shows the potential 
energy changes during these simulations. Phase transitions are evident by the 
sharp decline in potential energy. 

For the crystallization of argon liquid, the SGMD simulations result in a 

15 dramatic acceleration in searching out the low free energy states as compared to 
conventional MD simulations. 
Example 5 

The present example illustrate employing the method of the invention in 
determining the binding modes of benzyl alcohol with -cyclodextin in solution. 
20 This system models complexation of -cyclodextin with aromatic guests in general 
and also provides a model for aromatic-alkyl alcohol guests, which have been 
widely used as antiseptics. 

Two Self-guided Molecular Dynamics (SGMD) simulation runs were 
performed on the -cyclodextin - benzyl alcohol system, simulation I, 1 .5 ns and 
25 simulation II, 1.0 ns long. Both simulations used the same conditions: the time 
step was 0.002 ps, the temperature 300 K; non-bonding cutoff was 14 Angstrom 
and a smoothing function was applied between 8.0 and 12.0 Angstroms. The 
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dielectric constant was set to one, the frequency of updating the non-bonding pair 
list was every 20 steps and the self-guiding force was used with 0.1 scaling factor 
and 0.2 ps averaging time. The system was solvated using periodic boundary 
conditions (TIP3 water) with box size 35.0 Angstrom in each directions. The 
5 Charmm22 force field was used. 

In the starting conformation six benzyl alcohol molecules were placed 
around cyclodextrin. In the case of simulation I, four benzyl alcohol molecules 
surrounded the rim of the cyclodextrin 's macro-ring, two molecules being 'above' 
and 'below' of the host's cavity. In the case of simulation II all six benzyl alcohol 

10 molecules were placed around the rim, approximately in the plane of the 
glycosidic oxygens. In all cases, the aromatic ring of the benzyl alcohols were 
oriented approximately parallel to the latter plane. 

The conformations were saved every ps in each simulation and then 
grouped into clusters. In simulation II there are two benzyl alcohol guest 

15 molecules bound to cyclodextrin in different regions of the trajectory. 
Consequently the conformations are considered twice, with respect to the first 
guest and the second guest case yielding a total of 3500 conformations. 

The structures were clustered based on the following criteria: (a) center of 
mass distance between host and guest considering heavy atoms only; (b) the 

20 number of hydrogen bonds between the benzyl alcohol guest and -cyclodextrin's 
secondary hydroxyls or (c) water molecules; (d) the distance between the center 
of mass of the oxygens in the secondary hydroxyls and the polar end of the guest. 
Here the polar end was the center of mass of the oxygen in benzyl alcohol and the 
carbon atom bonded to; (e) the angle between the plane of the glycosidic oxygens 

25 and that of the guest's phenyl ring. 

The benzyl alcohol molecules in the starting conformation are distant from 
the host's cavity as shown in Figure 30. During the SGMD simulations, reversible 
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guest binding as well as competitive binding of two benzyl alcohol molecules 
were observed. Since host-guest unbinding and binding occurs in both simulations 
several times, it was likely that the length of these simulations was sufficient for 
adequate sampling of the conformational space of the complex. 
5 The conformations collected during simulation I and II are grouped into 

clusters in order to determine favored and less important binding modes. One of 
the major binding modes shows high similarity with that observed in the available 
crystal structure, as discussed below. Also there are other features of benzyl 
alcohol binding to cyclodextrin characteristic to the solution complex, to which 

10 SGMD conformational analysis was able to give detailed insight. 

There are two major binding modes identified from the simulation I, II 
trajectories while all other cluster conformations are much less represented. The 
two major clusters appear to share common characteristics as follows. The guest 
molecule does not hydrogen bond with the host's secondary hydroxyls, similarly 

15 to the conformation in the crystal structure. In the latter case the guest hydrogen 
bonds with other cyclodextrin molecules packed close in the crystal in a herring- 
bone like pattern. Even though conformations of the first cluster contain the guest 
fully inserted into the host's cavity, there was extensive hydrogen bonding 
between the complexed guest and the solvent similarly to the other major and 

20 overwhelmingly to most minor clusters. Interestingly, a minor third cluster 
exhibits the lowest non-bonding interaction energy with cyclodextrin most likely 
due to satisfying fully its hydrogen bonding potential through interactions with the 
host as well as with the solvent. However, hydrogen bonding between the host 
and guest does not necessarily lead to lowering of the energy of the complex due 

25 to likely loss of hydrogen bonding between the host and water molecules. 

The orientation of the benzyl alcohol in the major clusters was similar to 
that observed in the X-ray structure. The phenyl ring was not perpendicular to the 
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plane of the glycosidic oxygens; it forms an angle around 70°, which was very 
close to the value (69.6 °) observed in the crystal structure of the complex. 

The most important difference between the two major binding modes 
(major clusters) was the fact that the first cluster was more deeply inserted into 

5 -clyclodextrin' s cavity compared to the third cluster and hence it was closer to the 
degree of inclusion characteristic of the crystal structure. For comparison 
purposes, we calculated the center of mass distance between host and guest 
(heavy) atoms for the X-ray structure obtained from the Cambridge Structural 
Database (CSD). The latter distance was 0.53 Angstroms, comparable to the 

10 average value (0.77 ±0.28) of the first major cluster. 

Comparing the host's conformation between the two major cluster complex 
structures obtained in this example, there are no significant differences. In 
contrast with their overall averages, the average maximum and average minimum 
of the distances of the glycosidic oxygens from their center differ comparing the 

15 non-bonding clusters with the major two clusters, consistent with slight elliptical 
distortion of the clyclodextrin macro-ring upon complexation. 

Minor clusters obtained from the trajectories of simulation I and II contain 
non-bonding clusters, which represent conformations of clyclodextrin and benzyl 
alcohol while there was considerable distance between them. In additions to the 

20 case of one guest molecule complexed with the host (like the crystal structure), the 
simulation trajectories indicate competitive and reversible binding between 
clyclodextrin and benzyl alcohol. The distribution of the conformations among 
the clusters clearly show that the orientation of the benzyl alcohol with its 
hydroxyl group toward the host's secondary hydroxyls (toward the wider side) was 

25 not favorable (also referred to as the 'incorrect' orientation as opposed to the 
'correct' orientation). Clusters having the guest in this orientation are very poorly 
represented. Furthermore, in simulation I, benzyl alcohol initially binds in the 
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'incorrect' orientation, but after approximately 65 ps, the guest molecule leaves 
the host's cavity, re-binds and stays in the 'correct' conformation for the next 425 
ps simulation time without interruptions. Simulation II trajectory contains also 
only brief regions of 'incorrect' orientations even though competitive replacement 
5 of a benzyl alcohol in cyclodextrin ? s cavity by another benzyl alcohol molecule 
occurs in both, 'correct' and 'incorrect' orientations. 

Analysis of the SGMD trajectory suggests that inclusion complex formation 
was hydrophobically driven, since guest binding appears to effect primarily the 
exposure of the non-polar rather than the polar part of the host-guest system to the 

1 0 solvent. Hence our data suggest that burying of the aromatic hydrophobic moiety 
of benzyl alcohol makes guest binding favorable, as opposed to host-guest 
hydrogen bonding. 

As illustrated in this example, the invention provides for the first time a 
computational approach to determine the binding mode of an aromatic guest with 

1 5 cyclodextrin which does not incorporate a priori information regarding the guest' s 
orientation or position with respect to the host. One of the two major binding 
modes determined agrees well with the crystal structure available for the studied 
complex as illustrated in Figure 31. The computational results are also 
informative of features characteristic of the in solution structure of the 

20 cyclodextrin complex. 

This example also demonstrates the efficiency of the SGMD method used 
with a widely accepted force field for applications/cases in which rapid 
exploration of the available conformational space was essential. 

While the invention has been described in terms of preferred embodiments, 

25 the skilled artisan will appreciate that various modifications, substitutions, 
omissions and changes may be made without departing from the spirit thereof. 
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Accordingly, it is intended that the scope of the present invention be limited solely 
by the scope of the following claims, including equivalents thereof. 



